Stock Price Prediction

This notebook demonstrates stock price forecasting using ARIMA modeling with historical data from Bajaj Finserv.

In [1]:
# Installing
!pip install plotly
Requirement already satisfied: plotly in c:\users\user\anaconda3\lib\site-packages (5.24.1)
Requirement already satisfied: tenacity>=6.2.0 in c:\users\user\anaconda3\lib\site-packages (from plotly) (9.0.0)
Requirement already satisfied: packaging in c:\users\user\anaconda3\lib\site-packages (from plotly) (24.2)

1. Setup

We begin by installing and importing the required libraries.

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pmdarima
import plotly.graph_objects as go

from pmdarima import auto_arima
from scipy import stats
In [3]:
# Data source
dataframe = pd.read_csv(r"C:\Users\User\Downloads\BAJAJFINSV.csv")

2. Load the Dataset

We load the CSV file containing stock price data.
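
The load above leaves Date as plain strings. As a minimal sketch of an alternative, the dates can be parsed into a DatetimeIndex at read time. The two inline rows are copied from the head of the dataset shown in this notebook (a subset of columns); the in-memory buffer standing in for the CSV file is an assumption made for illustration.

```python
import io

import pandas as pd

# Two rows from the dataset head (subset of columns), standing in for the CSV file.
csv_text = """Date,Symbol,Open,High,Low,Close,VWAP
2008-05-26,BAJAJFINSV,600.00,619.00,501.0,509.10,548.85
2008-05-27,BAJAJFINSV,505.00,610.95,491.1,554.65,572.15
"""

# parse_dates converts the Date strings to timestamps; index_col makes them the index.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

print(type(sample.index).__name__)
print(sample.loc[pd.Timestamp("2008-05-27"), "VWAP"])
```

With a DatetimeIndex, date-based slicing and resampling become available, and plots get a proper time axis.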

In [4]:
# Overview
dataframe.head()
Out[4]:
         Date      Symbol  Series  Prev Close    Open    High    Low   Last   Close    VWAP   Volume      Turnover  Trades  Deliverable Volume  %Deliverble
0  2008-05-26  BAJAJFINSV      EQ     2101.05  600.00  619.00  501.0  505.1  509.10  548.85  3145446  1.726368e+14     NaN              908264       0.2888
1  2008-05-27  BAJAJFINSV      EQ      509.10  505.00  610.95  491.1  564.0  554.65  572.15  4349144  2.488370e+14     NaN              677627       0.1558
2  2008-05-28  BAJAJFINSV      EQ      554.65  564.00  665.60  564.0  643.0  640.95  618.37  4588759  2.837530e+14     NaN              774895       0.1689
3  2008-05-29  BAJAJFINSV      EQ      640.95  656.65  703.00  608.0  634.5  632.40  659.60  4522302  2.982921e+14     NaN             1006161       0.2225
4  2008-05-30  BAJAJFINSV      EQ      632.40  642.40  668.00  588.3  647.0  644.00  636.41  3057669  1.945929e+14     NaN              462832       0.1514
In [5]:
# Setting index
dataframe.set_index('Date',inplace=True)

3. Initial Data Exploration

Let's examine the summary statistics and count the missing values in each column.

In [6]:
# Describe
dataframe.describe()
Out[6]:
         Prev Close          Open          High           Low          Last         Close          VWAP        Volume      Turnover         Trades  Deliverable Volume  %Deliverble
count   3201.000000   3201.000000   3201.000000   3201.000000   3201.000000   3201.000000   3201.000000  3.201000e+03  3.201000e+03    2456.000000        3.201000e+03  3201.000000
mean    2755.864386   2760.382381   2803.614449   2716.731443   2758.781537   2758.657451   2761.156954  2.315312e+05  9.533424e+13   20892.811075        7.409510e+04     0.471614
std     2869.811765   2874.814173   2912.885262   2834.037357   2873.792614   2873.522615   2874.033545  4.402681e+05  2.176448e+14   32396.302068        1.464012e+05     0.218910
min       90.750000     88.150000     93.100000     88.150000     91.000000     90.750000     89.260000  4.570000e+02  1.376712e+10     149.000000        4.560000e+02     0.056200
25%      527.900000    528.600000    542.600000    520.000000    527.950000    527.900000    531.270000  3.981100e+04  2.751053e+12    2951.750000        2.086300e+04     0.287400
50%     1098.700000   1095.000000   1118.000000   1080.250000   1100.000000   1098.700000   1103.560000  9.995300e+04  1.090486e+13    9450.000000        4.159700e+04     0.469700
75%     5121.900000   5120.000000   5199.800000   5042.800000   5115.000000   5125.100000   5127.510000  2.315400e+05  8.755946e+13   24439.750000        8.308900e+04     0.636000
max    11176.550000  11000.000000  11300.000000  10868.700000  11175.450000  11176.550000  11081.780000  6.271671e+06  3.394379e+15  312959.000000        3.804696e+06     1.000000
In [7]:
# Data cleaning
# 1- check for null
dataframe.isnull().sum()
Out[7]:
Symbol                  0
Series                  0
Prev Close              0
Open                    0
High                    0
Low                     0
Last                    0
Close                   0
VWAP                    0
Volume                  0
Turnover                0
Trades                745
Deliverable Volume      0
%Deliverble             0
dtype: int64
In [8]:
dataframe["Trades"].isna()
Out[8]:
Date
2008-05-26     True
2008-05-27     True
2008-05-28     True
2008-05-29     True
2008-05-30     True
              ...  
2021-04-26    False
2021-04-27    False
2021-04-28    False
2021-04-29    False
2021-04-30    False
Name: Trades, Length: 3201, dtype: bool

4. Data Preprocessing

The Trades column has 745 missing values, all dated between 2008-05-26 and 2011-05-31, so we drop it. We then check for duplicate rows.

In [9]:
dataframe[dataframe["Trades"].isna()]
Out[9]:
                Symbol  Series  Prev Close    Open    High     Low    Last   Close    VWAP   Volume      Turnover  Trades  Deliverable Volume  %Deliverble
Date
2008-05-26  BAJAJFINSV      EQ     2101.05  600.00  619.00  501.00  505.10  509.10  548.85  3145446  1.726368e+14     NaN              908264       0.2888
2008-05-27  BAJAJFINSV      EQ      509.10  505.00  610.95  491.10  564.00  554.65  572.15  4349144  2.488370e+14     NaN              677627       0.1558
2008-05-28  BAJAJFINSV      EQ      554.65  564.00  665.60  564.00  643.00  640.95  618.37  4588759  2.837530e+14     NaN              774895       0.1689
2008-05-29  BAJAJFINSV      EQ      640.95  656.65  703.00  608.00  634.50  632.40  659.60  4522302  2.982921e+14     NaN             1006161       0.2225
2008-05-30  BAJAJFINSV      EQ      632.40  642.40  668.00  588.30  647.00  644.00  636.41  3057669  1.945929e+14     NaN              462832       0.1514
...                ...     ...         ...     ...     ...     ...     ...     ...     ...      ...           ...     ...                 ...          ...
2011-05-25  BAJAJFINSV      EQ      490.50  485.00  498.50  485.00  489.00  485.95  491.58    68329  3.358895e+12     NaN               17938       0.2625
2011-05-26  BAJAJFINSV      EQ      485.95  489.90  491.40  482.20  485.40  484.70  486.95    27605  1.344235e+12     NaN                8579       0.3108
2011-05-27  BAJAJFINSV      EQ      484.70  485.65  492.00  484.05  486.30  486.90  487.88    35212  1.717919e+12     NaN               11239       0.3192
2011-05-30  BAJAJFINSV      EQ      486.90  491.85  498.00  490.30  493.10  493.70  493.78    43441  2.145038e+12     NaN               13254       0.3051
2011-05-31  BAJAJFINSV      EQ      493.70  495.05  521.50  494.00  517.25  518.40  513.67   327035  1.679887e+13     NaN               67729       0.2071

745 rows × 14 columns
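
The missing Trades values sit at the start of the series, so one way to find where the column begins to be populated is `Series.first_valid_index()`. The sketch below uses a toy series: the numeric values and the two June dates are hypothetical, and only the pattern of leading NaNs mirrors the data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Trades column: leading NaNs, then recorded values (hypothetical numbers).
trades = pd.Series(
    [np.nan, np.nan, np.nan, 1200.0, 950.0],
    index=pd.to_datetime(
        ["2011-05-27", "2011-05-30", "2011-05-31", "2011-06-01", "2011-06-02"]
    ),
    name="Trades",
)

first_recorded = trades.first_valid_index()   # first label with a non-null value
n_missing = int(trades.isna().sum())

# True when every missing value precedes the first recorded one.
missing_only_at_start = bool(trades.loc[first_recorded:].notna().all())

print(first_recorded.date(), n_missing, missing_only_at_start)
```

If the gaps are confined to the start, an alternative to dropping the column is to keep it and restrict the analysis to rows from `first_recorded` onward. That is a design choice this notebook does not make.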

In [10]:
dataframe = dataframe.drop(columns=["Trades"])
In [11]:
# 2- Check for duplicate
dataframe.duplicated().sum()
Out[11]:
np.int64(0)
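
`DataFrame.duplicated()` compares column values only, so two rows that share a date but differ in price would not be flagged. A sketch of two extra checks on the index, run here on a toy frame with hypothetical values:

```python
import pandas as pd

# Toy frame with a repeated date in the index (hypothetical values).
toy = pd.DataFrame(
    {"VWAP": [548.85, 572.15, 573.00]},
    index=pd.Index(["2008-05-26", "2008-05-27", "2008-05-27"], name="Date"),
)

row_duplicates = int(toy.duplicated().sum())          # identical rows, index ignored
date_duplicates = int(toy.index.duplicated().sum())   # repeated index labels
is_sorted = toy.index.is_monotonic_increasing         # dates in non-decreasing order

print(row_duplicates, date_duplicates, is_sorted)
```

Here the row check reports nothing while the index check finds the repeated date. Chronological order matters as well, because the later train/test split is positional.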
In [12]:
dataframe.columns
Out[12]:
Index(['Symbol', 'Series', 'Prev Close', 'Open', 'High', 'Low', 'Last',
       'Close', 'VWAP', 'Volume', 'Turnover', 'Deliverable Volume',
       '%Deliverble'],
      dtype='object')
In [13]:
# Plotting
dataframe["VWAP"].plot(figsize=(15,5))
Out[13]:
<Axes: xlabel='Date'>
[Figure: line plot of VWAP over Date]

5. Exploratory Data Analysis

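We look at the distribution of VWAP (histogram, density estimate, and normal probability plot), plot High, Low, Last, and Close over time, and draw a candlestick chart of the first 50 rows.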

In [14]:
sns.histplot(data = dataframe["VWAP"])
Out[14]:
<Axes: xlabel='VWAP', ylabel='Count'>
[Figure: histogram of VWAP (x-axis: VWAP, y-axis: Count)]
In [15]:
sns.kdeplot(data = dataframe["VWAP"], fill = True)
Out[15]:
<Axes: xlabel='VWAP', ylabel='Density'>
[Figure: kernel density estimate of VWAP (x-axis: VWAP, y-axis: Density)]
In [16]:
stats.probplot(x = dataframe["VWAP"], plot = plt)
Out[16]:
((array([-3.51908695, -3.27647555, -3.14237236, ...,  3.14237236,
          3.27647555,  3.51908695]),
  array([   89.26,    93.99,    94.79, ..., 10486.75, 10980.4 , 11081.78])),
 (np.float64(2583.530410402808),
  np.float64(2761.156954076851),
  np.float64(0.8981685068494004)))
[Figure: normal probability plot of VWAP]
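
The second tuple in Out[16] is the least-squares fit through the probability plot: slope, intercept, and correlation coefficient r. An r of about 0.898 indicates that the VWAP quantiles depart noticeably from a straight line, that is, from a normal distribution. As a rough point of comparison, the sketch below computes the same r for two synthetic samples (both are assumptions made for illustration, not notebook data).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=1000)
skewed_sample = rng.lognormal(mean=0.0, sigma=1.0, size=1000)   # right-skewed

def probplot_r(sample):
    # With the default fit=True, probplot returns ((osm, osr), (slope, intercept, r)).
    (_, _), (_, _, r) = stats.probplot(sample, dist="norm")
    return r

r_normal = probplot_r(normal_sample)
r_skewed = probplot_r(skewed_sample)
print(round(r_normal, 3), round(r_skewed, 3))
```

The normal sample gives an r very close to 1, while the skewed sample gives a clearly lower value.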
In [17]:
cols = ['High', 'Low', 'Last',
       'Close']
dataframe[cols].plot(figsize=(15,5), subplots = True)
Out[17]:
array([<Axes: xlabel='Date'>, <Axes: xlabel='Date'>,
       <Axes: xlabel='Date'>, <Axes: xlabel='Date'>], dtype=object)
[Figure: four stacked line plots of High, Low, Last, and Close over Date]
In [18]:
go.Figure(data=[go.Candlestick(x=dataframe.index[0:50],
                                 open=dataframe['Open'][0:50],
                                 close=dataframe['Close'][0:50],
                                 high=dataframe['High'][0:50],
                                 low=dataframe['Low'][0:50])])
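
VWAP ranges from roughly 89 to 11,082 over the sample, which hints at a non-stationary series, and stationarity matters for ARIMA modeling. A formal check is usually done with an augmented Dickey-Fuller test (for example `statsmodels.tsa.stattools.adfuller`). The sketch below is a simplified stand-in: it computes the plain Dickey-Fuller t-statistic with a constant and no lag augmentation, using NumPy only, on synthetic series. The helper name `dickey_fuller_stat` and the data are assumptions for illustration.

```python
import numpy as np

def dickey_fuller_stat(y):
    """t-statistic of gamma in: diff(y)[t] = alpha + gamma * y[t-1] + error."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])   # constant and lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])    # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
noise = rng.normal(size=500)   # stationary by construction
walk = np.cumsum(noise)        # random walk, non-stationary by construction

# Approximate 5% critical value for the constant-only case (large samples).
CRITICAL_5PCT = -2.86

for name, series in [("noise", noise), ("walk", walk)]:
    stat = dickey_fuller_stat(series)
    verdict = "reject unit root" if stat < CRITICAL_5PCT else "cannot reject unit root"
    print(name, round(stat, 2), verdict)
```

A statistic well below the critical value rejects a unit root. On the real data the call would be `dickey_fuller_stat(dataframe["VWAP"])`, though a library implementation with lag selection is the safer choice.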

6. ARIMA Model Training

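Before fitting an ARIMA model with pmdarima.auto_arima, we build rolling-window features (3- and 7-row means and standard deviations of High, Low, Volume, and Turnover) to use as exogenous regressors, then split the data into training and test sets.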

In [19]:
# Rolling
lag_features = ['High', 'Low','Volume', 'Turnover']
dataframe[lag_features].head()
Out[19]:
              High    Low   Volume      Turnover
Date
2008-05-26  619.00  501.0  3145446  1.726368e+14
2008-05-27  610.95  491.1  4349144  2.488370e+14
2008-05-28  665.60  564.0  4588759  2.837530e+14
2008-05-29  703.00  608.0  4522302  2.982921e+14
2008-05-30  668.00  588.3  3057669  1.945929e+14
In [20]:
for col in lag_features:
    dataframe[col+"window_mean_3"] = dataframe[col].rolling(window=3).mean()
    dataframe[col+"window_mean_7"] = dataframe[col].rolling(window=7).mean()
    dataframe[col+"window_std_3"] = dataframe[col].rolling(window=3).std()
    dataframe[col+"window_std_7"] = dataframe[col].rolling(window=7).std()

dataframe.head()
Out[20]:
                Symbol  Series  Prev Close    Open    High    Low   Last   Close    VWAP   Volume  ...  Lowwindow_std_3  Lowwindow_std_7  Volumewindow_mean_3  Volumewindow_mean_7  Volumewindow_std_3  Volumewindow_std_7  Turnoverwindow_mean_3  Turnoverwindow_mean_7  Turnoverwindow_std_3  Turnoverwindow_std_7
Date
2008-05-26  BAJAJFINSV      EQ     2101.05  600.00  619.00  501.0  505.1  509.10  548.85  3145446  ...              NaN              NaN                  NaN                  NaN                 NaN                 NaN                    NaN                    NaN                   NaN                   NaN
2008-05-27  BAJAJFINSV      EQ      509.10  505.00  610.95  491.1  564.0  554.65  572.15  4349144  ...              NaN              NaN                  NaN                  NaN                 NaN                 NaN                    NaN                    NaN                   NaN                   NaN
2008-05-28  BAJAJFINSV      EQ      554.65  564.00  665.60  564.0  643.0  640.95  618.37  4588759  ...        39.542003              NaN         4.027783e+06                  NaN       773461.552524                 NaN           2.350756e+14                    NaN          5.682195e+13                   NaN
2008-05-29  BAJAJFINSV      EQ      640.95  656.65  703.00  608.0  634.5  632.40  659.60  4522302  ...        59.042386              NaN         4.486735e+06                  NaN       123703.660710                 NaN           2.769607e+14                    NaN          2.541759e+13                   NaN
2008-05-30  BAJAJFINSV      EQ      632.40  642.40  668.00  588.3  647.0  644.00  636.41  3057669  ...        22.040039              NaN         4.056243e+06                  NaN       865428.886510                 NaN           2.588793e+14                    NaN          5.614629e+13                   NaN

5 rows × 29 columns
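
The rolling windows in the loop above end on the current row, so each feature for a given day includes that day's own High, Low, Volume, and Turnover. If the aim is to forecast a day's VWAP before that day's trading is known, a common variant shifts the series by one row before rolling. This is not what the notebook does. A minimal sketch on a toy series with hypothetical values:

```python
import pandas as pd

# Toy series standing in for one of the lag_features columns (hypothetical values).
high = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], name="High")

same_day = high.rolling(window=3).mean()              # window ends on the current row
prior_only = high.shift(1).rolling(window=3).mean()   # window ends on the previous row

print(pd.DataFrame({"High": high, "same_day": same_day, "prior_only": prior_only}))
```

With the shift, the feature on row 3 is the mean of rows 0 to 2 and no longer depends on row 3 itself, at the cost of one more leading NaN.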

In [21]:
# Check for missing value
dataframe.isnull().sum()
Out[21]:
Symbol                   0
Series                   0
Prev Close               0
Open                     0
High                     0
Low                      0
Last                     0
Close                    0
VWAP                     0
Volume                   0
Turnover                 0
Deliverable Volume       0
%Deliverble              0
Highwindow_mean_3        2
Highwindow_mean_7        6
Highwindow_std_3         2
Highwindow_std_7         6
Lowwindow_mean_3         2
Lowwindow_mean_7         6
Lowwindow_std_3          2
Lowwindow_std_7          6
Volumewindow_mean_3      2
Volumewindow_mean_7      6
Volumewindow_std_3       2
Volumewindow_std_7       6
Turnoverwindow_mean_3    2
Turnoverwindow_mean_7    6
Turnoverwindow_std_3     2
Turnoverwindow_std_7     6
dtype: int64
In [22]:
dataframe.dropna(inplace=True)
dataframe.isnull().sum()
Out[22]:
Symbol                   0
Series                   0
Prev Close               0
Open                     0
High                     0
Low                      0
Last                     0
Close                    0
VWAP                     0
Volume                   0
Turnover                 0
Deliverable Volume       0
%Deliverble              0
Highwindow_mean_3        0
Highwindow_mean_7        0
Highwindow_std_3         0
Highwindow_std_7         0
Lowwindow_mean_3         0
Lowwindow_mean_7         0
Lowwindow_std_3          0
Lowwindow_std_7          0
Volumewindow_mean_3      0
Volumewindow_mean_7      0
Volumewindow_std_3       0
Volumewindow_std_7       0
Turnoverwindow_mean_3    0
Turnoverwindow_mean_7    0
Turnoverwindow_std_3     0
Turnoverwindow_std_7     0
dtype: int64
In [23]:
ind_features = ['Highwindow_mean_3', 'Highwindow_mean_7',
       'Highwindow_std_3', 'Highwindow_std_7', 'Lowwindow_mean_3',
       'Lowwindow_mean_7', 'Lowwindow_std_3', 'Lowwindow_std_7',
       'Volumewindow_mean_3', 'Volumewindow_mean_7', 'Volumewindow_std_3',
       'Volumewindow_std_7', 'Turnoverwindow_mean_3', 'Turnoverwindow_mean_7',
       'Turnoverwindow_std_3', 'Turnoverwindow_std_7']
In [24]:
# Check the number of rows
# First 2400 rows for training, the remaining rows (795 of 3195) for testing
dataframe.shape
Out[24]:
(3195, 29)
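In [25]:
# Train/test split: first 2400 rows for training, the rest for testing
# .copy() gives independent frames, so adding a column later does not touch a slice of dataframe
training_data = dataframe.iloc[:2400].copy()
testing_data = dataframe.iloc[2400:].copy()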
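
For time series the split has to be chronological rather than random, so that the model is never trained on rows that come after the ones it is tested on. The positional split above can be wrapped in a small helper. The name `chronological_split` is hypothetical, and the toy frame only reproduces the row count of the cleaned dataset.

```python
import pandas as pd

def chronological_split(frame, n_train):
    """Return (train, test) copies: the first n_train rows and the remainder."""
    train = frame.iloc[:n_train].copy()
    test = frame.iloc[n_train:].copy()
    return train, test

# Toy frame with the same row count as the cleaned dataset (3195); values are placeholders.
toy = pd.DataFrame({"VWAP": range(3195)})
train, test = chronological_split(toy, n_train=2400)

print(len(train), len(test))   # 2400 795
```

This assumes the rows are already sorted by date.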

7. Forecasting

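We fit the model on the training set with auto_arima, passing the rolling-window features as exogenous regressors. The fitted model is used for forecasting in the next section.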

In [26]:
import warnings
warnings.filterwarnings('ignore')
In [27]:
model = auto_arima(y=training_data['VWAP'], X=training_data[ind_features], trace=True)
Performing stepwise search to minimize aic
 ARIMA(2,0,2)(0,0,0)[0] intercept   : AIC=21501.827, Time=3.42 sec
 ARIMA(0,0,0)(0,0,0)[0] intercept   : AIC=22553.316, Time=1.53 sec
 ARIMA(1,0,0)(0,0,0)[0] intercept   : AIC=21962.075, Time=1.62 sec
 ARIMA(0,0,1)(0,0,0)[0] intercept   : AIC=21621.879, Time=2.75 sec
 ARIMA(0,0,0)(0,0,0)[0]             : AIC=38832.057, Time=1.45 sec
 ARIMA(1,0,2)(0,0,0)[0] intercept   : AIC=21603.205, Time=3.25 sec
 ARIMA(2,0,1)(0,0,0)[0] intercept   : AIC=21569.177, Time=3.02 sec
 ARIMA(3,0,2)(0,0,0)[0] intercept   : AIC=21504.408, Time=3.43 sec
 ARIMA(2,0,3)(0,0,0)[0] intercept   : AIC=21496.743, Time=3.65 sec
 ARIMA(1,0,3)(0,0,0)[0] intercept   : AIC=21507.703, Time=3.41 sec
 ARIMA(3,0,3)(0,0,0)[0] intercept   : AIC=21498.531, Time=4.05 sec
 ARIMA(2,0,4)(0,0,0)[0] intercept   : AIC=21493.517, Time=3.97 sec
 ARIMA(1,0,4)(0,0,0)[0] intercept   : AIC=21504.634, Time=3.70 sec
 ARIMA(3,0,4)(0,0,0)[0] intercept   : AIC=21484.775, Time=4.44 sec
 ARIMA(4,0,4)(0,0,0)[0] intercept   : AIC=21489.653, Time=6.02 sec
 ARIMA(3,0,5)(0,0,0)[0] intercept   : AIC=21490.528, Time=4.20 sec
 ARIMA(2,0,5)(0,0,0)[0] intercept   : AIC=21488.259, Time=4.03 sec
 ARIMA(4,0,3)(0,0,0)[0] intercept   : AIC=21488.441, Time=3.99 sec
 ARIMA(4,0,5)(0,0,0)[0] intercept   : AIC=21494.072, Time=4.56 sec
 ARIMA(3,0,4)(0,0,0)[0]             : AIC=21482.774, Time=3.92 sec
 ARIMA(2,0,4)(0,0,0)[0]             : AIC=21491.514, Time=3.53 sec
 ARIMA(3,0,3)(0,0,0)[0]             : AIC=21496.532, Time=2.52 sec
 ARIMA(4,0,4)(0,0,0)[0]             : AIC=21487.653, Time=3.02 sec
 ARIMA(3,0,5)(0,0,0)[0]             : AIC=21488.528, Time=2.87 sec
 ARIMA(2,0,3)(0,0,0)[0]             : AIC=21494.744, Time=2.32 sec
 ARIMA(2,0,5)(0,0,0)[0]             : AIC=21486.260, Time=2.92 sec
 ARIMA(4,0,3)(0,0,0)[0]             : AIC=21486.442, Time=2.46 sec
 ARIMA(4,0,5)(0,0,0)[0]             : AIC=21492.073, Time=3.04 sec

Best model:  ARIMA(3,0,4)(0,0,0)[0]          
Total fit time: 93.151 seconds
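
The trace shows the stepwise search comparing candidate orders by AIC and keeping the lowest. The sketch below replays that selection on a handful of (order, AIC) pairs copied from the trace; the dictionary layout is an assumption for illustration, not pmdarima's internal representation.

```python
# (p, d, q, with_intercept) -> AIC, copied from a few lines of the trace above.
candidates = {
    (2, 0, 2, True): 21501.827,
    (0, 0, 0, True): 22553.316,
    (3, 0, 4, True): 21484.775,
    (3, 0, 4, False): 21482.774,
    (2, 0, 5, False): 21486.260,
}

# The search keeps the candidate with the lowest AIC.
best_key = min(candidates, key=candidates.get)
p, d, q, with_intercept = best_key

print(f"ARIMA({p},{d},{q}) intercept={with_intercept} AIC={candidates[best_key]}")
```

The intercept-free ARIMA(3,0,4) beats its intercept variant by about 2 AIC points. That is what the AIC penalty of 2 per parameter would produce if the intercept added almost nothing to the likelihood.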
In [28]:
# See the model
model
Out[28]:
 ARIMA(3,0,4)(0,0,0)[0]          

8. Visualization of Forecast

We forecast VWAP over the test period and plot the forecast against the actual values.

In [29]:
# Model predict
forecast = model.predict(n_periods = len(testing_data), X = testing_data[ind_features])
forecast
Out[29]:
2400     5062.237053
2401     5067.151485
2402     5140.069665
2403     5181.652603
2404     5206.024391
            ...     
3190     9986.682752
3191    10045.087341
3192    10286.942142
3193    10784.870960
3194    11150.087691
Length: 795, dtype: float64
In [32]:
# Model testing
testing_data['Forecast_ARIMA'] = forecast.values

testing_data[['VWAP', 'Forecast_ARIMA']].head()
testing_data[['VWAP', 'Forecast_ARIMA']].plot(figsize=(15,5))
Out[32]:
<Axes: xlabel='Date'>
[Figure: line plot of VWAP and Forecast_ARIMA over Date for the test period]
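
A plot shows the overall fit, but a numeric error measure makes it easier to compare models. The sketch below computes MAE, RMSE, and MAPE with NumPy. The helper name `forecast_errors` and the sample numbers are hypothetical and are not results from this notebook.

```python
import numpy as np

def forecast_errors(actual, predicted):
    """Return MAE, RMSE, and MAPE (in percent) for two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mape = np.mean(np.abs(err / actual)) * 100   # assumes no zero in actual
    return mae, rmse, mape

# Hypothetical numbers for illustration only, not results from this notebook.
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [98.0, 103.0, 104.0, 105.0]

mae, rmse, mape = forecast_errors(actual, predicted)
print(f"MAE={mae:.3f} RMSE={rmse:.3f} MAPE={mape:.2f}%")
```

On the notebook's data the call would be `forecast_errors(testing_data['VWAP'], testing_data['Forecast_ARIMA'])`.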

9. Conclusion

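This notebook demonstrates a simple workflow for forecasting the VWAP of Bajaj Finserv stock: cleaning the data, building rolling-window features, fitting an ARIMA model with auto_arima on the first 2,400 rows, and forecasting the remaining 795.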